Channel Compensation in the Generalised Vector Taylor Series Approach to Robust ASR
Abstract
Vector Taylor Series (VTS) is a powerful technique for robust ASR but, in its standard form, it can only be applied to log-filter bank and MFCC features. In earlier work, we presented a generalised VTS (gVTS) that extends the applicability of VTS to front-ends which employ a power transformation non-linearity. gVTS was shown to provide performance improvements in both clean and additive noise conditions. This paper makes two novel contributions. First, while the previous gVTS formulation assumed that noise was purely additive, we now derive gVTS formulae for the case of speech in the presence of both additive noise and channel distortion. Second, we propose a novel iterative method for estimating the channel distortion which utilises gVTS itself and converges after a few iterations. Since the new gVTS blindly assumes the existence of both additive noise and channel effects, it is important not to introduce extra distortion when either is absent. Experiments conducted on the LVCSR Aurora-4 database show that the new formulation passes this test. In the presence of channel distortion only, it provides relative WER reductions of up to 30% and 26% compared with the previous gVTS and with multi-style training with cepstral mean normalisation, respectively.
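The abstract does not reproduce the gVTS formulae, so the following is only an illustrative sketch of the kind of mismatch function involved, not the paper's derivation. It assumes that filter-bank energies combine as Y = X·H + N and that the front-end applies a power non-linearity x = X^alpha, which gives y = ((x·h)^(1/alpha) + n^(1/alpha))^alpha. The standard log-domain VTS mismatch function is included for comparison. All function names and the exponent `alpha` are hypothetical.

```python
import math


def mismatch_log(x, h, n):
    """Standard log-domain VTS mismatch: y = x + h + log(1 + exp(n - x - h))."""
    return x + h + math.log1p(math.exp(n - x - h))


def mismatch_power(x, h, n, alpha):
    """Assumed power-domain mismatch: y = ((x*h)^(1/alpha) + n^(1/alpha))^alpha.

    x, h, n are the power-transformed speech, channel, and noise features
    (all non-negative); alpha is the exponent of the power non-linearity.
    """
    return ((x * h) ** (1.0 / alpha) + n ** (1.0 / alpha)) ** alpha


def mismatch_power_jacobian(x, h, n, alpha):
    """Partial derivatives (dy/dx, dy/dh, dy/dn) of mismatch_power.

    Requires x, h, n > 0 because each derivative divides by its own argument.
    """
    u = (x * h) ** (1.0 / alpha)   # speech-plus-channel term in the energy domain
    v = n ** (1.0 / alpha)         # noise term in the energy domain
    s = (u + v) ** (alpha - 1.0)
    return s * u / x, s * u / h, s * v / n


def vts_linearise(x, h, n, x0, h0, n0, alpha):
    """First-order Taylor expansion of mismatch_power around (x0, h0, n0)."""
    y0 = mismatch_power(x0, h0, n0, alpha)
    gx, gh, gn = mismatch_power_jacobian(x0, h0, n0, alpha)
    return y0 + gx * (x - x0) + gh * (h - h0) + gn * (n - n0)
```

Under these assumptions, setting h = 1 and n = 0 returns the clean feature unchanged, which mirrors the abstract's requirement that compensation should not introduce extra distortion when noise or channel effects are absent.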
Similar resources
Use of Generalised Nonlinearity in Vector Taylor Series Noise Compensation for Robust Speech Recognition
Designing good normalisation to counter the effect of environmental distortions is one of the major challenges for automatic speech recognition (ASR). The Vector Taylor series (VTS) method is a powerful and mathematically well principled technique that can be applied to both the feature and model domains to compensate for both additive and convolutional noises. One of the limitations of this ap...
A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions
In this paper, we present our recent development of a model-domain environment-robust adaptation algorithm, which demonstrates high performance in the standard Aurora 2 speech recognition task. The algorithm consists of two main steps. First, the noise and channel parameters are estimated using multi-sources of information including a nonlinear environment distortion model in the cepstral domai...
Deep Neural Network-Based Noise Estimation for Robust ASR in Dual-Microphone Smartphones
The performance of many noise-robust automatic speech recognition (ASR) methods, such as vector Taylor series (VTS) feature compensation, heavily depends on an estimation of the noise that contaminates speech. Therefore, providing accurate noise estimates for methods of this kind is both crucial and challenging. In this paper we investigate the use of deep neural networks (DNNs) to perform no...
An HMM Compensation Approach Using Unscented Transformation for Noisy Speech Recognition
The performance of current HMM-based automatic speech recognition (ASR) systems degrades significantly in real-world applications where there exist mismatches between training and testing conditions caused by factors such as mismatched signal capturing and transmission channels and additive environmental noises. Among many approaches proposed previously to cope with the above robust ASR problem,...
CVX-Optimized Beamforming and Vector Taylor Series Compensation with German ASR Employing Star-Shaped Microphone Array
This paper addresses the problem of distant speech recognition in reverberant noisy conditions employing a star-shaped microphone array and vector Taylor series (VTS) compensation. First, a beamformer yields an enhanced single-channel signal by applying convex (CVX) optimization over three spatial dimensions given the spatio-temporal position of the target speaker as prior knowledge. Then, VTS ...